Clique percolation in random networks 
The notion of k-clique percolation in random graphs is introduced, where k is the size of the complete
subgraphs whose large scale organizations are analytically and numerically investigated. For the Erdős-Rényi
graph of N vertices we obtain that the percolation transition of k-cliques takes place when the probability of
two vertices being connected by an edge reaches the threshold $p_c(k) = [(k-1)N]^{-1/(k-1)}$. At the transition
point the scaling of the giant component with N is highly non-trivial and depends on k. We discuss why
clique percolation is a novel and efficient approach to the identification of overlapping communities in large real
networks.

PACS numbers: 02.10.Ox, 89.75.Hc, 05.70.Fh, 64.60.-i 



There has been a quickly growing interest in networks,
since they can represent the structure of a wide class of complex
systems occurring from the level of cells to society. Data
obtained on real networks show that the corresponding graphs
exhibit unexpected non-trivial properties, e.g., anomalous degree
distributions, diameter, spreading phenomena, clustering
coefficient, and correlations [1, 2, 3, 4, 5]. Very recently
great attention has been paid to the local structural
units of networks. Small and well defined subgraphs have
been introduced as "motifs". Their distribution and clustering
properties can be used to interpret global
features as well. Somewhat larger units, made up of vertices
that are more densely connected to each other than to
the rest of the network, are often referred to as communities,
and have been considered to
be the essential structural units of real networks. They have
no obvious definition, and most of the recent methods for their
identification rely on dividing the network into smaller pieces.
The biggest drawback of these methods is that they do not allow
for overlapping communities, although overlaps are generally
assumed to be crucial features of communities. In this
Letter we lay down the fundamentals of a kind of percolation
phenomenon on graphs, which can also be used as an effective
and deterministic method for uniquely identifying overlapping
communities in large real networks [17].

Meanwhile, the various aspects of the classical Erdős-Rényi
(ER) uncorrelated random graph [18] remain of
great interest since such a graph can serve both as a test bed
for checking all sorts of new ideas concerning complex networks
in general, and as a prototype to which all other random
graphs can be compared. Perhaps the most conspicuous early
result on the ER graphs was related to the percolation transition
taking place at $p = p_c = 1/N$, where p is the probability
that two vertices are connected by an edge and N is the total
number of vertices in the graph. The appearance of a giant
component, which is also referred to as the percolating component,
results in a dramatic change in the overall topological
features of the graph and has been at the center of interest for
other networks as well.

In this Letter we address the general question of subgraph
percolation in the ER model. We obtain analytic and simulation
results related to the appearance of a giant component
made of complete subgraphs of k vertices (k-cliques). In particular,
we provide an analytic expression for the threshold
probability at which the percolation transition of k-cliques
takes place. The transition is continuous, characterized by
non-universal critical exponents, which depend on both k and
the way the size of the giant component is measured. Our analytic
calculations are in full agreement with the corresponding
numerical simulations.

Before we proceed to calculate the threshold and the exponents
we need to outline some basic definitions. k-cliques, the
central objects of our investigation, are defined as complete
(fully connected) subgraphs of k vertices [19]. As an illustration,
in Fig. 1 the 3-cliques (triangles) are emphasized with
either black or dark gray edges. We also introduce a few new
notions specific to our problem. (i) k-clique adjacency: two
k-cliques are adjacent if they share $k-1$ vertices, i.e., if they








FIG. 1: Sketches of two ER graphs of N = 20 vertices and with
edge probabilities p = 0.13 (left one) and p = 0.22 (right one,
generated by adding more random edges to the left one). In both cases all
the edges belong to a "giant" connected component, because the edge
probabilities are much larger than the threshold ($p_c = 1/N = 0.05$)
for the classical ER percolation transition. However, in the left one p
is below the 3-clique (triangle) percolation threshold, $p_c(3) \approx 0.16$,
calculated from Eq. (1); therefore, only two small 3-clique
percolation clusters (distinguished by black and dark gray edges) can be
observed. In the right graph, on the other hand, p is above this
threshold and, as a consequence, most 3-cliques accumulate in a "giant"
3-clique percolation cluster (black edges). This graph also exhibits
an overlap (half black, half dark gray vertex) between two 3-clique
percolation clusters (black and dark gray).
differ only in a single vertex. (ii) k-clique chain: a subgraph
which is the union of a sequence of adjacent k-cliques. (iii) k-clique
connectedness: two k-cliques are k-clique-connected
if they are parts of a k-clique chain. (iv) k-clique percolation
cluster (or component): a maximal k-clique-connected
subgraph, i.e., the union of all k-cliques that are k-clique-connected
to a particular k-clique. This is illustrated in Fig. 1,
where both graphs contain two 3-clique percolation clusters,
the smaller ones in dark gray and the larger ones in black. We
note that these objects can be considered as interesting specific
cases of the general graph theoretic objects defined in
Refs. [20] and [21] in very different contexts.
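
Definitions (i)-(iv) translate almost directly into code. The following is a minimal sketch, not the procedure used for the simulations in this Letter: it enumerates k-cliques by brute force (suitable only for small graphs) and merges adjacent cliques with a union-find structure. The function names `adjacency`, `k_cliques`, and `clique_percolation_clusters` are illustrative choices, and vertices are assumed to be sortable labels such as integers.

```python
from itertools import combinations


def adjacency(edges):
    """Build a vertex -> set-of-neighbours map from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """Return all k-cliques as sorted tuples (brute force, small graphs only)."""
    return [c for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def clique_percolation_clusters(adj, k):
    """Return the vertex sets of all k-clique percolation clusters."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two k-cliques are adjacent exactly when they share a (k-1)-subset,
    # so every clique is merged with the first clique seen for each subset.
    first_with_face = {}
    for i, clique in enumerate(cliques):
        for face in combinations(clique, k - 1):
            j = first_with_face.setdefault(face, i)
            parent[find(i)] = find(j)

    clusters = {}
    for i, clique in enumerate(cliques):
        clusters.setdefault(find(i), set()).update(clique)
    return list(clusters.values())


# Two triangles sharing an edge, plus a third triangle that touches them
# only at vertex 3: two 3-clique clusters that overlap in one vertex.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
print([sorted(c) for c in clique_percolation_clusters(adjacency(edges), 3)])
# -> [[0, 1, 2, 3], [3, 4, 5]]
```

For k = 2 the same routine reduces to ordinary connected components (ignoring isolated vertices), since two edges are adjacent when they share one vertex.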

A k-clique percolation cluster is very much like a regular
(edge) percolation cluster in the k-clique adjacency graph,
where the vertices represent the k-cliques of the original
graph, and there is an edge between two vertices if the corresponding
k-cliques are adjacent. Moving a particle from one
vertex of this adjacency graph to another one along an edge
is equivalent to rolling a k-clique template from one k-clique
of the original graph to an adjacent one. A k-clique template
can be thought of as an object that is isomorphic to a complete
graph of k vertices. Such a template can be placed onto
any k-clique of the original graph, and rolled to an adjacent
k-clique by relocating one of its vertices and keeping its other
$k-1$ vertices fixed. Thus, the k-clique percolation clusters of
a graph are all those subgraphs that can be fully explored but
cannot be left by rolling a k-clique template in them.
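
The template-rolling picture can be sketched as a breadth-first exploration. The code below is an illustrative sketch under stated assumptions: `roll_template` is a hypothetical name, `adj` maps each vertex to the set of its neighbours, `seed` is assumed to be a valid k-clique of the graph, and k is assumed to be at least 2. Each move relocates one vertex of the template while the other k - 1 stay fixed.

```python
from collections import deque


def roll_template(adj, seed):
    """Return the set of k-cliques reachable from `seed` by template rolling."""
    seed = frozenset(seed)
    visited = {seed}
    queue = deque([seed])
    while queue:
        clique = queue.popleft()
        for leaving in clique:
            fixed = clique - {leaving}
            # A destination must be linked to all k-1 fixed vertices.
            candidates = set.intersection(*(adj[v] for v in fixed)) - clique
            for entering in candidates:
                rolled = fixed | {entering}
                if rolled not in visited:
                    visited.add(rolled)
                    queue.append(rolled)
    return visited


# Same toy graph as a dictionary: triangles {0,1,2} and {1,2,3} share an
# edge, while triangle {3,4,5} is attached only through vertex 3.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3},
       3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
cluster = roll_template(adj, {0, 1, 2})
print(sorted(set().union(*cluster)))  # -> [0, 1, 2, 3]
```

The template started on {0, 1, 2} never reaches {3, 4, 5}, because that would require sharing two vertices with it; this is the sense in which a cluster "cannot be left".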

Now, we present a general result for the threshold probability
(critical point) of k-clique percolation using heuristic arguments.
We find that a giant k-clique component appears in
an ER graph (as illustrated for k = 3 in Fig. 1) at $p = p_c(k)$,
where

$$p_c(k) = \frac{1}{[(k-1)N]^{1/(k-1)}}. \tag{1}$$

Obviously, for k = 2 this result agrees with the known percolation
threshold ($p_c = 1/N$) for ER graphs, because 2-clique
connectedness is equivalent to regular (edge) connectedness.
Expression (1) can be obtained by requiring that after rolling a
k-clique template from a k-clique to an adjacent one (by relocating
one of its vertices), the expectation value of the number
of adjacent k-cliques, where the template can roll further (by
relocating another of its vertices), be equal to 1 at the percolation
threshold. The intuitive argument behind this criterion
is that a smaller expectation value would result in premature
k-clique percolation clusters, because starting from any
k-clique the rolling would quickly come to a halt and, as a
consequence, the size of the clusters would decay exponentially.
A larger expectation value, on the other hand, would
allow an infinite series of bifurcations for the rolling, ensuring
that a giant cluster is present in the system. The above expectation
value can be estimated as $(k-1)(N-k-1)p^{k-1}$,
where the first term $(k-1)$ counts the number of vertices
of the template that can be selected for the next relocation,
the second term $(N-k-1)$ counts the number of potential
destinations for this relocation, out of which only the fraction
$p^{k-1}$ is acceptable, because each of the new $k-1$ edges (associated
with the relocation) must exist in order to obtain a new
k-clique. For large N, our criterion can thus be written as
$(k-1)N p_c^{k-1} = 1$, from which we get expression (1) for the
threshold probability. The above heuristic approach is similar
in spirit to the one used in Ref. [22] in the context of standard
percolation on networks.
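
As a quick numerical check of the criterion, the threshold of Eq. (1) and the expected number of onward moves can be evaluated directly. This is a small sketch; the function names are illustrative and not taken from the Letter.

```python
def clique_percolation_threshold(k, n):
    """Threshold p_c(k) = [(k - 1) N]^(-1/(k - 1)) from Eq. (1)."""
    if k < 2 or n < 1:
        raise ValueError("need k >= 2 and n >= 1")
    return ((k - 1) * n) ** (-1.0 / (k - 1))


def expected_onward_moves(k, n, p):
    """Heuristic estimate (k - 1)(N - k - 1) p^(k - 1) used in the text."""
    return (k - 1) * (n - k - 1) * p ** (k - 1)


# The N = 20 example of Fig. 1: edge percolation and triangle percolation.
print(round(clique_percolation_threshold(2, 20), 2))  # -> 0.05
print(round(clique_percolation_threshold(3, 20), 2))  # -> 0.16
```

For large N, evaluating `expected_onward_moves` at the threshold gives a value close to 1, which is the criterion stated above; at N = 20 the finite-size factor (N - k - 1)/N is still noticeable.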

It is important to point out that this result can be made
stronger by a more detailed derivation which we shall present
elsewhere due to space limitations. In short, starting from the
distribution of the number of k-cliques adjacent to a randomly
selected one, and applying the so-called generating function
formalism [23], one can derive the generating function of the
distribution of the number of k-cliques that can be visited
from a randomly selected one. This function diverges as p
approaches $p_c(k)$ from below, signaling the threshold for percolation.
Furthermore, our result for $p_c(k)$ is also in perfect
agreement with the numerical simulations (see below).

There are two plausible choices to measure the size of the
largest k-clique percolation cluster. The most natural one,
which we denote by $N^*$, is the number of vertices belonging
to this cluster. We can also define an order parameter
associated with this choice as the relative size of that cluster:

$$\Phi = N^*/N. \tag{2}$$

The other choice is the number $\mathcal{N}^*$ of k-cliques of the largest
k-clique percolation cluster (or equivalently, the number of
vertices of the largest component in the k-clique adjacency
graph). The associated order parameter is again the relative
size of this cluster:

$$\Psi = \mathcal{N}^*/\mathcal{N}, \tag{3}$$

where $\mathcal{N}$ denotes the total number of k-cliques in the graph
(or the total number of vertices in the adjacency graph). $\mathcal{N}$
can be estimated as

$$\mathcal{N} \approx \binom{N}{k} p^{k(k-1)/2} \approx \frac{N^k}{k!}\, p^{k(k-1)/2}, \tag{4}$$

because k different vertices can be selected in $\binom{N}{k}$ different
ways, and any such selection makes a k-clique only if all the
$k(k-1)/2$ edges between these k vertices exist, each with
probability p. Note that the classical ER percolation is equivalent
to our k = 2 case, and the ER order parameter (relative
number of edges) is identical to $\Psi$. Also note that in general
the size of the largest cluster could be measured as the number
of its l-cliques, $\mathcal{N}^*_{(l)}$, for $1 \leq l \leq k$. However, for simplicity
we restrict ourselves to the two limiting cases ($N^* \equiv \mathcal{N}^*_{(1)}$
and $\mathcal{N}^* \equiv \mathcal{N}^*_{(k)}$) defined above.
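
The two forms of the estimate in Eq. (4) are easy to compare numerically. The sketch below uses only the standard library; both function names are illustrative.

```python
from math import comb, factorial


def expected_clique_count(n, k, p):
    """Expected number of k-cliques: C(N, k) * p^(k(k-1)/2)."""
    return comb(n, k) * p ** (k * (k - 1) // 2)


def large_n_clique_count(n, k, p):
    """Large-N form of the same estimate: N^k / k! * p^(k(k-1)/2)."""
    return n ** k / factorial(k) * p ** (k * (k - 1) // 2)


# With p = 1 every vertex triple is a triangle, so the count is C(20, 3).
print(expected_clique_count(20, 3, 1.0))  # -> 1140.0
```

The two expressions differ by the factor N(N-1)...(N-k+1)/N^k, which tends to 1 as N grows with k held fixed.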

Our computer simulations indicate that the two order parameters
behave differently near the threshold probability. To
illustrate this, in Figs. 2(a) and 3(a) we plotted $\Phi$ and $\Psi$, respectively,
as a function of $p/p_c(k)$ for k = 4 and for various
system sizes (N), averaged over several runs.
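
A measurement of this kind can be sketched for small systems as follows. This is an illustrative brute-force version, not the simulation code behind Figs. 2 and 3: `er_graph` and `order_parameters` are hypothetical names, clique enumeration costs of the order of N^k operations, and the largest cluster is taken separately by vertex count and by clique count, which is an assumption of this sketch.

```python
import random
from itertools import combinations


def er_graph(n, p, rng):
    """Sample an ER graph: each of the n(n-1)/2 edges exists with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def order_parameters(adj, k):
    """Return (Phi, Psi) of Eqs. (2) and (3) for one graph."""
    n = len(adj)
    cliques = [c for c in combinations(range(n), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    if not cliques:
        return 0.0, 0.0
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge k-cliques that share k-1 vertices (k-clique adjacency).
    first_with_face = {}
    for i, clique in enumerate(cliques):
        for face in combinations(clique, k - 1):
            parent[find(i)] = find(first_with_face.setdefault(face, i))

    members = {}
    for i, clique in enumerate(cliques):
        members.setdefault(find(i), []).append(clique)
    most_vertices = max(len(set().union(*cs)) for cs in members.values())
    most_cliques = max(len(cs) for cs in members.values())
    return most_vertices / n, most_cliques / len(cliques)


# One sample at twice the k = 3 threshold for N = 60 (values vary with the seed).
n, k = 60, 3
p_c = ((k - 1) * n) ** (-1.0 / (k - 1))
phi, psi = order_parameters(er_graph(n, 2 * p_c, random.Random(1)), k)
```

Averaging `phi` and `psi` over several seeds and repeating for a range of p/p_c(k) gives curves of the type described in the text, although reaching the system sizes used there would require a more efficient clique search.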

The order parameter $\Phi$ for $k \geq 3$ converges to a step
function as $N \to \infty$. The fact that the step is located at
FIG. 2: Simulation results for the order parameter $\Phi$ averaged over
several runs (the statistical error is smaller than the symbol size). (a)
The convergence of $\Phi$ as a function of $p/p_c(k)$ to a step function in
the $N \to \infty$ limit is illustrated for k = 4. (b) The width of the steps
follows a power law, $\sim N^{-\alpha}$, as the steps collapse onto a single
curve if we stretch them out by $N^{\alpha}$ horizontally. The data for k = 4
and 5 are shifted upward by 0.4 and 0.8, respectively, for clarity.



$p/p_c(k) = 1$ is actually the numerical proof of the validity
of our theoretical prediction (1) for $p_c(k)$. The width of the
steps follows a power law, $\sim N^{-\alpha}$, with some exponent $\alpha$.
Plotting $\Phi$ as a function of $[p/p_c(k) - 1]N^{\alpha}$, i.e., stretching
out the horizontal scale by $N^{\alpha}$, the data collapse onto a single
curve. This is shown for k = 3, 4, and 5 in Fig. 2(b). The exponent
$\alpha$ seems to be around 0.5 for $k \geq 3$. Although for k = 3
a slight deviation from $\alpha = 0.5$ has been obtained, we cannot
distinguish that from a possible logarithmic correction.

The order parameter $\Psi$ for $k \geq 2$, on the other hand, similarly
to the classical ER transition, converges to a limit function,
which is 0 for $p/p_c(k) < 1$ and grows continuously from
0 to 1 if we increase $p/p_c(k)$ from 1 to $\infty$.

One of the most fundamental results in random graph theory
concerns the behavior of the largest component at the percolation
threshold, where it becomes a giant (infinitely large)
component in the $N \to \infty$ limit. Erdős and Rényi showed
[18] that for the random graphs they introduced, the size of the
largest component $\mathcal{N}^*$ (measured as the number of its edges)
at $p = p_c = 1/N$ diverges with the system size as $N^{2/3}$, or
equivalently, the order parameter $\Psi$ scales as $N^{-1/3}$. Since
the giant component at the threshold has a tree-like structure,
its number of vertices, $N^*$, also diverges as $N^{2/3}$. We
shall show that similar scaling behavior can be obtained for
FIG. 3: The order parameter $\Psi$ for the same simulations as in Fig. 2.
(a) As illustrated for k = 4, $\Psi$ as a function of $p/p_c(k)$ converges
to a limit function (which is 0 for $p/p_c(k) < 1$ and grows continuously
to 1 above $p/p_c(k) = 1$) in the $N \to \infty$ limit. (b) The order
parameter at the threshold, $\Psi_c$, scales as some negative power of N,
in good agreement with expression (6).

k-clique percolation at the threshold probability $p_c(k)$.

If we assume that the k-clique adjacency graph is like an
ER graph [24], then at the threshold the size of its giant component
$\mathcal{N}^*_c$ scales as $\mathcal{N}_c^{2/3}$. The subscript "c" throughout this
Letter indicates that the system is at the percolation threshold
(or critical point). Plugging $p = p_c$ from Expression (1)
into Eq. (4) and omitting the N-independent factors we get
the scaling

$$\mathcal{N}_c \sim N^{k/2} \tag{5}$$

for the total number of k-cliques. Thus, the size of the giant
component $\mathcal{N}^*_c$ is expected to scale as $\mathcal{N}_c^{2/3} \sim N^{k/3}$ and the
order parameter $\Psi_c$ as $\mathcal{N}_c^{2/3}/\mathcal{N}_c \sim N^{-k/6}$.
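
The substitution behind Eq. (5) can be written out step by step. The following worked version uses only Eqs. (1) and (4) and the ER-like assumption stated above; it introduces nothing beyond them.

```latex
\begin{align*}
\mathcal{N}_c
  &\approx \frac{N^k}{k!}\, p_c^{\,k(k-1)/2}
   = \frac{N^k}{k!}\, \bigl[(k-1)N\bigr]^{-\frac{1}{k-1}\cdot\frac{k(k-1)}{2}} \\
  &= \frac{N^k}{k!}\, \bigl[(k-1)N\bigr]^{-k/2}
   = \frac{(k-1)^{-k/2}}{k!}\, N^{k/2}
   \sim N^{k/2}, \\
\mathcal{N}^*_c &\sim \mathcal{N}_c^{2/3} \sim N^{k/3}, \qquad
\Psi_c = \frac{\mathcal{N}^*_c}{\mathcal{N}_c}
       \sim \mathcal{N}_c^{-1/3} \sim N^{-k/6}.
\end{align*}
```

For k = 2 this reproduces the classical ER results quoted earlier: the giant component grows as $N^{2/3}$ and the order parameter scales as $N^{-1/3}$.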

This is valid, however, only if $k \leq 3$. The reason for the
breakdown of the above scaling is that for k > 3 it predicts
that the number of k-cliques of the giant k-clique percolation
cluster, i.e., the number of vertices of the giant component in
the k-clique adjacency graph, $\mathcal{N}^*_c \sim N^{k/3}$, grows faster
than N. On the other hand, in analogy with the structure of
the giant component of the classical ER problem, we expect
that the giant component in the adjacency graph also has a
tree-like structure at the threshold, with very few loops. As
a consequence, almost every vertex of the adjacency graph
corresponds to a vertex of the original graph. Thus, in the
adjacency graph the giant component should not grow faster
than N at the threshold. Therefore, for $k \geq 3$ we expect that
$\mathcal{N}^*_c \sim N$, and using Eq. (5), $\Psi_c = \mathcal{N}^*_c/\mathcal{N}_c \sim N^{1-k/2}$. In
summary:



$$\Psi_c \sim \begin{cases} N^{-k/6} & \text{for } k \leq 3 \\ N^{1-k/2} & \text{for } k \geq 3 \end{cases} \tag{6}$$



We have also determined the scaling of $\Psi_c$ at $p_c$ as a function
of N numerically, and the results are in good agreement with
the above heuristic arguments, as shown in Fig. 3(b).
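
The two branches of Eq. (6) can be tabulated with exact fractions, which also makes it easy to see that they coincide at k = 3. This is a small illustrative helper; the function name is hypothetical.

```python
from fractions import Fraction


def critical_exponent(k):
    """Exponent x in Psi_c ~ N**x according to Eq. (6)."""
    if k < 2:
        raise ValueError("need k >= 2")
    er_like = Fraction(-k, 6)           # adjacency graph treated as ER-like
    size_limited = 1 - Fraction(k, 2)   # giant cluster limited to ~N cliques
    return er_like if k <= 3 else size_limited


print([str(critical_exponent(k)) for k in (2, 3, 4, 5)])
# -> ['-1/3', '-1/2', '-1', '-3/2']
```

At k = 3 both expressions give -1/2, so the crossover between the two regimes is continuous in the exponent.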

Finally, we discuss the relevance of our approach to community
finding. If we start rolling a
k-clique template from a highly connected part of a network
we can proceed and find all vertices that can be reached from
the initial k-clique. Such a k-clique percolation cluster can be
identified as a community, because of the many (at least $k-1$)
links of any of its vertices to the other vertices of this cluster.
The links are organized into complete subgraphs (k-cliques),
which is also a characteristic of most communities (just think
of human relations). With different values of k we can identify
communities of different strength (or cohesiveness). Our
k-clique percolation clusters also satisfy a number of basic
requirements (local; density based; not too restrictive; have
no cut-node; allow overlaps) that are expected from a community
definition, but are not satisfied simultaneously by any
other existing definition in the literature [20, 25]. Although
using k-cliques might seem to be a very strict constraint on
the community definition, we note that relaxing this constraint
(e.g., by allowing incomplete k-cliques) is practically equivalent
to lowering the value of k.

The sharp percolation transition (step in $\Phi$) of the ER
graphs provides the theoretical basis for the applicability of
our community definition to real networks. This is because if
the network were completely random, only very few and small
clusters would be expected for any k at which the network
is below the transition point. However, if large clusters do
appear, they must correspond to locally dense structures, i.e.,
real communities. Moreover, since these communities are locally
above the percolation threshold, their identification is
immune to random removal of edges as long as their edge
density remains above the threshold.

The most important aspect of such a method is that, naturally,
a single vertex can be part of several communities [17],
as illustrated in Fig. 1 (right) by the half black, half dark gray
vertex. In terms of a person, he/she can belong to a number
of groups (of highly connected people) in such a way that no
two groups share a $(k-1)$-clique (there are no $k-1$ people
in any two groups who would all know each other and, therefore,
would allow a k-clique template to roll through). Thus,
each vertex can belong to a number of individually identifiable
communities and, in turn, each community can have a
large number of contacts with other communities, just as it
happens in most realistic situations (see, e.g., Ref. [26]). This
is very much in contrast with the divisive and agglomerative
methods, which force each vertex to belong to only one community
and be separated from the others, leading to the loss
of many of the communities of the network.

The approach presented in this Letter allows a number of
generalizations (e.g., k-cliques connected through $(k-1)$-cliques,
k-cliques with weighted edges, etc.) and opens new
directions in the study of network structures made of highly
interconnected parts including communities overlapping in
various non-trivial ways. As an important biological example,
we have successfully applied our method to the identification
of protein communities in the protein-protein interaction network
of yeast, which has allowed us to make predictions for
the yet unknown function of some proteins [17].

This work has been supported in part by the Hungarian Sci- 
ence Foundation (OTKA), grant Nos. F047203 and T034995. 